A Distance-Based Packing Method for High Dimensional Data

Authors

  • Tae-wan Kim
  • Ki-Joune Li
Abstract

The Minkowski-sum cost model indicates that balanced data partitioning is not beneficial for high-dimensional data. We therefore study several unbalanced partitioning methods and propose cost models for them based on the Minkowski-sum cost model. Our cost models indicate that, under a uniform data distribution, the distance to one of the two ends of the data space dominates the expected value. We generalize the studied methods to adapt to the data distribution and propose a new partitioning method for high-dimensional index structures, called DD-CSP (Distance-based Distribution-adaptive Cyclic Sliced Partition). At each partition, it splits data from the lower end or the higher end toward the center of the data space, based on a distance cost function. Based on this fact, we propose a data structure called DSR (Dimension-independent Single value Representation), which takes a constant amount of storage to represent MBHs (Minimum Bounding Hyper-cubes) independent of dimension. In our experimental studies, we compare DD-CSP with R-tree, HP, STR, TGS, and the methods analyzed in our paper on real and synthetic data sets with wide ranges of dimensions and with selectivities varying from 10 to 10. In all experiments, DD-CSP outperforms all other methods and achieves up to 567% savings in response time. It is thus a clearly winning strategy in terms of range queries and storage requirements.
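
As a rough illustration of the Minkowski-sum idea mentioned in the abstract, the following Python sketch computes the probability that a cubic range query intersects a page's bounding box, assuming query centers are uniformly distributed over a unit data space. The function name `access_probability`, the unit-cube data space, and the example boxes are assumptions made for illustration; the paper's own cost models are not reproduced here.

```python
from math import prod


def access_probability(lo, hi, q):
    """Probability that a cubic query of side q intersects the box [lo, hi].

    Assumes the query center is uniform in the unit cube [0, 1]^d.
    The box is enlarged by q/2 on every side (its Minkowski sum with the
    query) and the enlarged box is clipped to the data space.
    """
    return prod(
        min(1.0, h + q / 2.0) - max(0.0, l - q / 2.0)
        for l, h in zip(lo, hi)
    )


d = 8  # hypothetical dimensionality

# Two slices of equal width 0.25 along dimension 0, full extent elsewhere.
end_slice = ([0.0] + [0.0] * (d - 1), [0.25] + [1.0] * (d - 1))
center_slice = ([0.375] + [0.0] * (d - 1), [0.625] + [1.0] * (d - 1))

p_end = access_probability(*end_slice, q=0.5)
p_center = access_probability(*center_slice, q=0.5)

print(p_end)     # 0.5
print(p_center)  # 0.75
```

Under these assumptions, a slice that touches an end of the data space is accessed less often than an equally sized slice in the middle, because part of its Minkowski enlargement falls outside the data space.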


Similar resources

Abstract: Packing rectangular shapes into a rectangular space is one of the most important topics in Cutting & Packing (C&P) problems such as the cutting problem, the bin-packing problem, and the distributor's pallet loading problem. Assume a set of rectangular pieces with specific lengths, widths, and utility values. Also assume a rectangular packing space with a specific width and length. The obj...


A Comparative Study of Exact Algorithms for the Two Dimensional Strip Packing Problem

In this paper we consider a two-dimensional strip packing problem. The problem consists of packing a set of rectangular items into one strip of width W and infinite height. The items must be packed without overlapping and parallel to the edges of the strip, and we assume that the items are oriented, i.e. they cannot be rotated. To solve this problem, we use three exact methods: a branch and bound method, a...


A Study of Several Statistical Methods for Constructing Control Charts of Individual Observations for Autoregressive Processes

Packing rectangular shapes into a rectangular space is one of the most important topics in Cutting & Packing (C&P) problems such as the cutting problem, the bin-packing problem, and the distributor's pallet loading problem. Assume a set of rectangular pieces with specific lengths, widths, and utility values. Also assume a rectangular packing space with a specific width and length. The objective fu...


A New Method for Arranging Rectangular Pieces in a Rectangular Space

Packing rectangular shapes into a rectangular space is one of the most important topics in Cutting & Packing (C&P) problems such as the cutting problem, the bin-packing problem, and the distributor's pallet loading problem. Assume a set of rectangular pieces with specific lengths, widths, and utility values. Also assume a rectangular packing space with a specific width and length. The objective fun...


3D Scene and Object Classification Based on Information Complexity of Depth Data

In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...


High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...




Publication date: 2003